Performance Comparison for Mining Large Data from the Internet and Learning using ID3 Algorithm in a Docker versus Virtual Machine Environment
Authors
Abstract
Every day, 2.5 quintillion bytes of data are generated, and a sizeable portion of that data is available through the internet. The quality of the decisions made from this data depends on how thoroughly the procured data is analysed. Containers provide operating-system-level virtualization, and Linux containers offer secure execution environments by executing processes independently of one another. [1] This paper aims to show that a Docker container performs much better than a virtual machine environment when mining large data from the internet and learning with the ID3 algorithm to generate a decision tree that predicts useful results.
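The abstract names the ID3 algorithm without describing it. As a rough illustration only, the following is a minimal Python sketch of textbook ID3 (choose the attribute with the highest information gain, split, recurse). It is not the authors' implementation and says nothing about the Docker versus virtual machine measurements; the toy weather-style rows, the attribute names `outlook` and `windy`, and the helper names are all hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(rows, labels, attr):
    """Entropy reduction obtained by splitting the rows on `attr`."""
    total = len(labels)
    subsets = {}
    for row, label in zip(rows, labels):
        subsets.setdefault(row[attr], []).append(label)
    remainder = sum(len(s) / total * entropy(s) for s in subsets.values())
    return entropy(labels) - remainder


def id3(rows, labels, attrs):
    """Build a decision tree: a class label at a leaf, a dict at a node."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not attrs:
        return majority
    # ID3 greedily picks the attribute with the largest information gain.
    best = max(attrs, key=lambda a: information_gain(rows, labels, a))
    node = {"attr": best, "branches": {}, "default": majority}
    remaining = [a for a in attrs if a != best]
    for value in {row[best] for row in rows}:
        pairs = [(r, l) for r, l in zip(rows, labels) if r[best] == value]
        sub_rows = [r for r, _ in pairs]
        sub_labels = [l for _, l in pairs]
        node["branches"][value] = id3(sub_rows, sub_labels, remaining)
    return node


def predict(tree, row):
    """Walk the tree; fall back to the node's majority class on unseen values."""
    while isinstance(tree, dict):
        tree = tree["branches"].get(row[tree["attr"]], tree["default"])
    return tree


# Hypothetical toy data, used only to exercise the sketch.
rows = [
    {"outlook": "sunny", "windy": "no"},
    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "rainy", "windy": "no"},
    {"outlook": "rainy", "windy": "yes"},
    {"outlook": "overcast", "windy": "no"},
    {"outlook": "overcast", "windy": "yes"},
]
labels = ["yes", "no", "no", "no", "yes", "yes"]

tree = id3(rows, labels, ["outlook", "windy"])
print(predict(tree, {"outlook": "sunny", "windy": "no"}))
```

On this toy data the root split falls on `outlook`, because it separates the classes far better than `windy`. A real workload of the kind the abstract describes would feed mined internet data into the same procedure, and the runtime of that pipeline is what a container versus virtual machine comparison would measure.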
Similar resources
A hybrid model based on machine learning and genetic algorithm for detecting fraud in financial statements
Financial statement fraud has increasingly become a serious problem for business, government, and investors. In fact, this threatens the reliability of capital markets, corporate heads, and even the audit profession. Auditors in particular face their apparent inability to detect large-scale fraud, and there are various ways to identify this problem. In order to identify this problem, the majori...
Enhancing Learning from Imbalanced Classes via Data Preprocessing: A Data-Driven Application in Metabolomics Data Mining
This paper presents a data mining application in metabolomics. It aims at building an enhanced machine learning classifier that can be used for diagnosing cachexia syndrome and identifying its involved biomarkers. To achieve this goal, a data-driven analysis is carried out using a public dataset consisting of 1H-NMR metabolite profile. This dataset suffers from the problem of imbalanced classes...
A Comparative Study of SVM and RF Methods for Classification of Alteration Zones Using Remotely Sensed Data
Identification and mapping of the significant alterations are the main objectives of the exploration geochemical surveys. The field study is time-consuming and costly to produce the classified maps. Therefore, the processing of remotely sensed data, which provide timely and multi-band (multi-layer) data, can be substituted for the field study. In this study, the ASTER imagery is used for altera...
Prediction of Student Learning Styles using Data Mining Techniques
This paper focuses on the prediction of student learning styles using data mining techniques within their institutions. This prediction was aimed at finding out how different learning styles are achieved within learning environments which are specifically influenced by already existing factors. These learning styles, have been affected by different factors that are mainly engraved and found wit...
S3PSO: Students’ Performance Prediction Based on Particle Swarm Optimization
Nowadays, new methods are required to take advantage of the rich and extensive gold mine of data given the vast content of data particularly created by educational systems. Data mining algorithms have been used in educational systems especially e-learning systems due to the broad usage of these systems. Providing a model to predict final student results in educational course is a reason for usi...
Publication date: 2016